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A sparse precision matrix can be directly translated into a sparse 
Gaussian graphical model under the assumption that the data follow 
a joint normal distribution. This neat property makes high-dimen- 
sional precision matrix estimation very appealing in many applica- 
tions. However, in practice we often face nonnormal data, and variable 
transformation is often used to achieve normality. In this paper we 
consider the nonparanormal model that assumes that the variables 
follow a joint normal distribution after a set of unknown monotone 
transformations. The nonparanormal model is much more flexible 
than the normal model while retaining the good interpretability of the 
latter in that each zero entry in the sparse precision matrix of the non- 
paranormal model corresponds to a pair of conditionally independent 
variables. In this paper we show that the nonparanormal graphical 
model can be efficiently estimated by using a rank-based estimation 
scheme which does not require estimating these unknown transfor- 
mation functions. In particular, we study the rank-based graphical 
lasso, the rank-based neighborhood Dantzig selector and the rank- 
based CLIME. We establish their theoretical properties in the setting 
where the dimension is nearly exponentially large relative to the sam- 
ple size. It is shown that the proposed rank-based estimators work 
as well as their oracle counterparts defined with the oracle data. Fur- 
thermore, the theory motivates us to consider the adaptive version 
of the rank-based neighborhood Dantzig selector and the rank-based 
CLIME that are shown to enjoy graphical model selection consis- 
tency without assuming the irrepresentable condition for the oracle 
and rank-based graphical lasso. Simulated and real data are used to 
demonstrate the finite performance of the rank-based estimators. 
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1. Introduction. Estimating covariance or precision matrices is of fun- 
damental importance in multivariate statistical methodologies and appli- 
cations. In particular, when data follow a joint normal distribution, X = 
(Xi, . . . ,Xp) ^ Np{fi,'S), the precision matrix = can be directly 
translated into a Gaussian graphical model. The Gaussian graphical model 
serves as a noncausal structured approach to explore the complex systems 
consisting of Gaussian random variables, and it finds many interesting ap- 
plications in areas such as gene expression genomics and macroeconomics 
determinants study [Friedman (2004), Wille et al. (2004), Dobra, Eicher and 
Lenkoski (2010)]. The precision matrix plays a critical role in the Gaussian 
graphical models because the zero entries in = {6ij)pxp precisely cap- 
ture the desired conditional independencies, that is, 6ij = if and only if 
XiALXj\'X\{Xi,Xj} [Lauritzen (1996), Edwards (2000)]. 

The sparsity pursuit in precision matrices was initially considered by 
Dempster (1972) as the covariance selection problem. Multiple testing meth- 
ods have been employed for network exploration in the Gaussian graphi- 
cal models [Drton and Perlman (2004)]. With rapid advances of the high- 
throughput technology (e.g., microarray, functional MRI), estimation of 
a sparse graphical model has become increasingly important in the high- 
dimensional setting. Some well-developed penalization techniques have been 
used for estimating sparse Gaussian graphical models. In a highly-cited pa- 
per, Meinshausen and Biihlmann (2006) proposed the neighborhood selec- 
tion scheme which tries to discover the smallest index set nsa for each vari- 
able Xa satisfying X^ JL X \ {X^ , X„e^ } | X„e^ . Meinshausen and Biihlmann 
(2006) further proposed to use the lasso [Tibshirani (1996)] to fit each neigh- 
borhood regression model. Afterwards, one can summarize the zero patterns 
by aggregation via union or intersection. Yuan (2010) considered the Dantzig 
selector [Candes and Tao (2007)] as an alternative to the lasso penalized 
least squares in the neighborhood selection scheme. Peng et al. (2009) pro- 
posed the joint neighborhood lasso selection. Penalized likelihood methods 
have been studied for Gaussian graphical modeling [Yuan and Lin (2007)]. 
Friedman, Hastie and Tibshirani (2008) developed a fast blockwise coordi- 
nate descent algorithm [Banerjee, El Ghaoui and d'Aspremont (2008)] called 
graphical lasso for efficiently solving the lasso penalized Gaussian graphical 
model. Rate of convergence under the Frobenius norm was established by 
Rothman et al. (2008). Ravikumar et al. (2011) obtained the convergence 
rate under the elementwise ioo norm and the spectral norm. Lam and Fan 
(2009) studied the nonconvex penalized Gaussian graphical model where a 
nonconvex penalty such as SCAD [Fan and Li (2001)] is used to replace the 
lasso penalty in order to overcome the bias issue of the lasso penalization. 
Zhou et al. (2011) proposed a hybrid method for estimating sparse Gaussian 
graphical models: they first infer a sparse Gaussian graphical model struc- 
ture via thresholding neighborhood selection and then estimate the precision 
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Table 1 

Testing for normality of the gene expression measurements in the Arabidposis thaliana 
data. This table illustrates the number out of 39 genes rejecting the null hypothesis of 
normality at the significance level of 0.05 





Critical value 


Cramer-von Mises 


Lilliefors 


Shapiro-Francia 


Raw data 


0.05 


30 


30 


35 




0.05/39 


24 


26 


28 


Log data 


0.05 


29 


24 


33 




0.05/39 


11 


12 


16 



matrix of the submodel by maximum likelihood. Cai, Liu and Luo (2011) re- 
cently proposed a constrained ii minimization estimator called CLIME for 
estimating sparse precision matrices and established its convergence rates 
under the elementwise ioo norm and Frobenius norm. 

Although the normality assumption can be relaxed if we only focus on 
estimating a precision matrix, it plays an essential role in making the neat 
connection between a sparse precision matrix and a sparse Gaussian graph- 
ical model. Without normality, we ought to be very cautious when trans- 
lating a good sparse precision matrix estimator into an interpretable sparse 
Gaussian graphical model. However, the normality assumption often fails 
in reality. For example, the observed data are often skewed or have heavy 
tails. To illustrate the issue of nonnormality in real applications, let us con- 
sider the gene expression data to construct isoprenoid genetic regulatory 
network in Arabidposis thaliana [Wille et al. (2004)], including 16 genes 
from the mevalonate (MVA) pathway in the cytosolic, 18 genes from the 
plastidial (MEP) pathway in the chloroplast and 5 encode proteins in the 
mitochondrial. This dataset contains gene expression measurements of 39 
genes assayed on n = 118 Affymetrix GeneChip microarrays. This dataset 
was analyzed by Wille et al. (2004), Li and Gui (2006) and Drton and Perl- 
man (2007) in the context of Gaussian graphical modeling after taking the 
log-transformation of the data. However, the normality assumption is still 
inappropriate even after the log-transformation. To show this, we conduct 
the normality test at the significance level of 0.05 as in Table 1. It is clear 
that at most 9 out of 39 genes would pass any of three normality tests. 
Even after log-transformation, at least 60% genes reject the null hypothesis 
of normality. With Bonferroni correction there are still over 30% genes that 
fail to pass any normality test. Figure 1 plots the histograms of two key iso- 
prenoid genes MECPS in the MEP pathway and MK in the MVA pathway 
after the log-transformation, clearly showing the nonnormality of the data 
after the log-transformation. 

Using transformation to achieve normality is a classical idea in statistical 
modeling. The celebrated Box-Cox transformation is widely used in regres- 
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Fig. 1. Illustration of the nonnormality after the log-transformation preprocessing. 



sion analysis. However, any parametric modeling of the transformation suf- 
fers from model mis-specification which could lead to misleading inference. 
In this paper we take a nonparametric transformation strategy to handle the 
nonnormality issue. Let F(-) be the CDF of a continuous random variable 
X and be the inverse of the CDF of A^(0, 1). Consider the transfor- 

mation from X to Z hy Z = ^-^{F{X)). Then it is easy to see that Z is 
standard normal regardless of F. Motivated by this simple fact, we consider 
modeling the data by the following nonparanormal model: 

The nonparanormal model: X = (Xi, . . . ,Xp) follows a p-dimensional non- 
paranormal distribution if there exists a vector of unknown univariate mono- 
tone increasing transformations, denoted by f = (/i, . . . such that the 
transformed random vector follows a multivariate normal distribution with 
mean and covariance 5], 

(1) f (X) = . . . , fpiXp)) ~ iVp(0, E), 

where without loss of generality the diagonals of S are equal to 1. 

Note that model (1) implies that fj{Xj) is a standard normal random vari- 
able. Thus, fj must be <1>~^ o Fj where Fj is the CDF of Xj. The marginal 
normality is always achieved by transformations, so model (1) basically as- 
sumes that those marginally normal-transformed variables are jointly nor- 
mal as well. We follow Liu, Lafferty and Wasserman (2009) to call model (1) 
the nonparanormal model, but model (1) is in fact a semiparametric Gaus- 
sian copula model. The semiparametric Gaussian copula model is a nice 
combination of flexibility and interpretability, and it has generated a lot of 
interests in statistics, econometrics and finance; see Song (2000), Chen and 



RANK-BASED ESTIMATION OF NONPARANORMAL GRAPHICAL MODELS 5 



Fan (2006), Chen, Fan and Tsyrennikov (2006) and references therein. Let 
Z = (Zi, . . . , Zp) = (/i(Xi), . . . , fp{Xp)). By the joint normahty assumption 
of Z, we know that 6ij = if and only if Zi AL Zj\7i\{Zi, Zj}. Interestingly, 
we have that 

Z,ALZj\Z\ {Z,, Zj} ^ X,AL Xj |X \ {X,,Xj}. 

Therefore, a sparse can be directly translated into a sparse graphical 
model for presenting the original variables. 

In this work we primarily focus on estimating which is then used to con- 
struct a nonparanormal graphical model. As for the nonparametric transfor- 
mation function, by the expression fj = ^~^oFj, we have a natural estimator 
for the transformation function of the jth variable as fj = o Fj~ where 

Fj' is a Winsorized empirical CDF of the jth variables. Note that the Win- 
sorization is used to avoid infinity value and to achieve better bias- variance 
tradeoff; see Liu, Lafferty and Wasserman (2009) for detailed discussion. In 
this paper we show that we can directly estimate without estimating these 
nonparametric transformation functions at all. This statement seems to be a 
bit surprising because a natural estimation scheme is a two-stage procedure: 
first estimate fj and then apply a well-developed sparse Gaussian graphical 
model estimation method to the transformed data Zj = f (xj), 1 <i <n. Liu, 
Lafferty and Wasserman (2009) have actually studied this "plug-in" estima- 
tion approach. They proposed a Winsorized estimator of the nonparametric 
transformation function and used the graphical lasso in the second stage. 
They established convergence rate of the "plug-in" estimator when p is re- 
stricted to a polynomial order of n. However, Liu, Lafferty and Wasserman 
(2009) did not get a satisfactory rate of convergence for the "plug-in" ap- 
proach, because the rate of convergence can be established for the Gaus- 
sian graphical model even when p grows with n almost exponentially fast 
[Ravikumar et al. (2011)]. As noted in Liu, Lafferty and Wasserman (2009), 
it is very challenging, if not impossible, to push the theory of the "plug-in" 
approach to handle exponentially large dimensions. One might ask if using 
a better estimator for the transformation functions could improve the rate 
of convergence such that p could be allowed to be nearly exponentially large 
relative to n. This is a legitimate direction for research. We do not pursue 
this direction in this work. Instead, we show that we could use a rank-based 
estimation approach to achieve the exact same goal without estimating these 
transformation functions at all. 

Our estimator is constructed in two steps. First, we propose using the 
adjusted Spearman's rank correlation to get a nonparametric sample es- 
timate of 5]. As the second step, we compute a sparse estimator from 
the rank-based sample estimate of S. For that purpose, we consider several 
regularized rank estimators, including the rank-based graphical lasso, the 
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rank-based neighborhood Dantzig selector and the rank-based CLIME. The 
complete methodological details are presented in Section 2. In Section 3 
we establish theoretical properties of the proposed rank-based estimators, 
regarding both precision matrix estimation and graphical model selection. 
In particular, we are motivated by the theory to consider the adaptive ver- 
sion of the rank-based neighborhood Dantzig selector and the rank-based 
CLIME, which can select the true support set with an overwhelming proba- 
bility without assuming a stringent irrepresentable condition required for the 
oracle and rank-based graphical lasso. Section 4 contains numerical results 
and Section 5 has some concluding remarks. Technical proofs are presented 
in an Appendix. 

A referee informed us in his/her review report that Liu et al. (2012) also 
independently used the rank-based correlation in the context of nonpara- 
metric Gaussian graphical model estimation. A major focus in Liu et al. 
(2012) is the numerical demonstration of the robustness property of the 
rank-based methods using both Spearman's rho and Kendall's tau when 
data are contaminated. In the present paper we provide a systematic anal- 
ysis of the rank-based estimators, and our theoretical analysis further leads 
to the rank-based adaptive Dantizg selector and the rank-based adaptive 
CLIME in order to achieve improved sparsity recovery properties. Our theo- 
retical analysis of the rank-based adaptive Dantizg selector is of independent 
interest. Although the theory is established for the rank-based estimators 
using Spearman's rho, the same analysis can be easily adopted to prove the 
theoretical properties of the rank-based estimators using Kendall's tau rank 
correlation. 

2. Methodology. We first introduce some necessary notation. For a ma- 
trix A = (aij), we define its entry- wise ii norm as ||A||i = ^^-^ ^-^ |ajj|, and its 
entry-wise ioo norm as ||A||max = ™ax(j j) \aij\. For a vector v = {vi, . . . ,vi), 
we define its £i norm as ||v||^^ = and its ioo norm as ||v||f^ = 

maxj \vj\. To simplify notation, define M.a,b as the sub-matrix of M with 
row indexes A and column indexes B, and define as the sub-vector of v 
with indexes A. Let (k) be the index set {1, . . . , A; — 1, A; -|- 1, . . . ,p}. Denote 
by '^(k) — ^(fc),(fc) the sub-matrix of SI with both A;th row and column re- 
moved, and denote by (t^;.) = '^[k),k the vector including all the covariances 
associated with the A:th variable. In the same fashion, we can also define 
d{k), and so on. 

2.1. The "oracle" procedures. Suppose an oracle knows the underlying 
transformation vector; then the oracle could easily recover "oracle data" by 
applying these true transformations, that is, Zj = f(xj),l < i < n. Before 
presenting our rank-based estimators, it is helpful to revisit the "oracle" 
procedures that are defined based on the "oracle data." 
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• The oracle graphical lasso. Let S be the sample covariance matrix for 
the "oracle" data, and then the "oracle" log-profile-likelihood becomes 
logdet(0) — tr(S 0). The "oracle" graphical lasso solves the following li 
penalized likelihood problem: 

(2) min-logdet(e) + tr(5]°e) + |%|. 

• The oracle neighborhood lasso selection. Under the nonparanormal model, 
for each /c = 1, . . . ,p, the "oracle" variable given Z^^,) is normally dis- 
tributed as A^(Z^,^S^^^<T(fc), 1 — cr^^5]^^(7(;;)), which can be written as 

Zk = ^Jk)Pk + e/c with (3k = ^1k)^(k) and Ek ~ A^(0, 1 - a'^j.^T.'^^^a- i^k)) ■ 
Notice that fik and £k are closely related to the precision matrix 0, that 
is, Okk = 1/Var(efc) and = — /S^/ Var(efc). Thus for the /cth variable, 
and (3k share the same sparsity pattern. Following Meinshausen and 
Biihlmann (2006), the oracle neighborhood lasso selection obtains the so- 
lution (3^. from the following lasso penalized least squares problem: 

1 " 

(3) min -^(z,,-z5,)/3f + A||/3||,„ 

2=1 

and then the sparsity pattern of can be estimated by aggregating the 
neighborhood support set of (3k = {(3°f^)jj^k ijick = {j : f^°k / 0}) via inter- 
section or union. 

We notice the fact that 



1 " 



n 



z,k - ^lk)Pf = /3^5](,)/3 - 2/3^^^,) + al^. 



Then (3) can be written in the following equivalent form: 
(4) niin ^/3^S(,)/3 - 2/3^^^,) + X\\(3\W. 



The oracle neighborhood Dantzig selector. Following Yuan (2010) the lasso 
least squares in (3) can be replaced with the Dantzig selector 



(5) min \\(3\\i^ subject to 



p-i 



1 " 



n 

i=l 



< A. 



Then the sparsity pattern of can be similarly estimated by aggregating 
via intersection or union. Furthermore, we notice that 

1 " 

-^Hk)i^Iik)P - ^ik) = '^{k)f3 - ^Ik)- 
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Then (5) can be written in the following equivalent form: 

(6) /^^p"^! "^"^1 subject to ||5]°fc)/3 - o-^fc)!!^^ < ^• 

• The oracle CLIME. Following Cai, Liu and Luo (2011) we can estimate 
precision matrices by solving a constrained ii minimization problem, 

(7) argmin||©||i subject to ||S © — Illmax — 

Cai, Liu and Luo (2011) compared the CLIME with the graphical lasso, 
and showed that the CLIME enjoys nice theoretical properties without 
assuming the irrepresentable condition of Ravikumar et al. (2011) for the 
graphical lasso. 

2.2. The proposed rank-based estimators. The existing theoretical results 
in the literature can be directly applied to these oracle estimators. However, 
the "oracle data" zi, Z2, . . . , z„ are unavailable and thus the above-mentioned 
"oracle" procedures are not genuine estimators. Naturally we wish to con- 
struct a genuine estimator that can mimic the oracle estimator. To this 
end, we can derive an alternative estimator of 5] based on the actual data 
xi,X2, . . . ,x„ and then feed this genuine covariance estimator to the graph- 
ical lasso, the neighborhood selection or CLIME. To implement this nat- 
ural idea, we propose a rank-based estimation scheme. Note that S can 
be viewed as the correlation matrix as well, that is, aij = corr(zj, Zj). Let 
{xii,X2i, . . . , Xni) be the observed values of variable Xi. We convert them to 
ranks denoted by rj = {rii,r2i, ■ . ■ ,rni). Spearman's rank correlation fjj is 
defined as Pearson's correlation between and r^. Spearman's rank corre- 
lation is a nonparametric measure of dependence between two variables. It 
is important to note that are the ranks of the "oracle" data. Therefore, 
fjj is also identical to the Spearman's rank correlation between the "oracle" 
variables Zi,Zj. In other words, in the framework of rank-based estimation, 
we can treat the observed data as the "oracle" data and avoid estimating p 
nonparametric transformation functions. We make a note here that one may 
consider other rank correlation measures such as Kendall's tau correlation. 
To fix the idea we use Spearman's rank correlation throughout this paper. 

The nonparanormal model implies that {Zi,Zj) follows a bivariate normal 
distribution with correlation parameter cjjj. Then a classical result due to 
Kendall (1948) [see also Kruskal (1958)] shows that 

(8) lim E(fjj) = -arcsinf ^cjjj ) , 

which indicates that fij is a biased estimator of 0"^. To correct the bias, 
Kendall (1948) suggested using the adjusted Spearman's rank correlation 
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Combining (8) and (9) we see that rfj is an asymptotically unbiased esti- 
mator of aij. Naturally we define the rank-based sample estimate of S as 
follows: 

In Section 3 we show is a good estimator of I]. Then we naturally come 
up with the following rank-based estimators of by using the graphical 
lasso, the neighborhood Dantzig selector and CLIME: 

• The rank-based graphical lasso: 

(10) 0g = argmin-logdet(0) +tr(R'0) + A 

• The rank-based neighborhood Dantzig selector: A rank-based estimate of 

can be solved by 

(11) ^fc'"^ = argmin||^||^^ subject to ||Rffc)/3 - ff^^J^^ < A. 

. 1 r 1 r '^s.nd '\s.nd 

The support of can be estimated from the support of fj^ j • • • > Hp 
via aggregation by union or intersection. We can also construct the rank- 
based precision matrix estimator 0„^ = (^fj"'^)i<j,j<p with 

^kk - \\Pk ) ^(k)Pk - ^Pk ) ^{k) + ^) 

and 



^{k) — -^kk P 



k 

{k = 1, . . . ,p). We can symmetrize 0„^ by solving the following optimiza- 
tion problem [Yuan (2010)]: 

®nd = argmin||0 - 0,^,^11^^. 

Theoretical analysis of the rank-based neighborhood Dantzig selector in 
Section 3.2 motivated us to consider using the adaptive Dantzig selec- 
tor in the rank-based neighborhood estimation in order to achieve better 
graphical model selection performance. See Section 3.2 for more details. 
• The rank-based CLIME: 

(12) 0^ = argmin|[0||i subject to ||R''0 - < A. 



Let e^'s be the natural basis in MP. By Lemma 1 in Cai, Liu and Luo 
(2011) the above optimization problem can be further decomposed into p 
subproblems of vector minimization, 

(13) = argmin ll^llf^ subject to ||R**0 — e^H^ < A, 
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^ S S .C s .c 

for k = 1, . . . ,p. Then 0^ is exactly equivalent to {Oi , . . . ,6p ). Note that 
0(, could be asymmetric. Following Cai, Liu and Luo (2011) we consider 

with <9|/ = ^|/-f{|gs,c|<|gs.c|} + ^|f/||gs.c|>|gs.c||. In the original paper Cai, 

Liu and Luo (2011) proposed to use hard thresholding for graphical model 
selection. Borrowing the basic idea from Zou (2006), we propose an adap- 
tive version of the rank-based CLIME in order to achieve better graphical 
model selection. See Section 3.3 for more details. 



2.3. Rank-based neighborhood lasso? One might consider the rank-based 
neighborhood lasso defined as follows: 

(14) ^niin^/3^R^,)/3-2/3^f^,) +A||/3||,,. 



However, there is a technical problem for the above definition. The Spear- 
man's rank correlation matrix R is always positive semidefinite, but the 
adjusted correlation matrix could become indefinite. To our best knowl- 
edge, Devlin, Gnanadesikan and Kettenring (1975) were the first to point 
out the indefinite issue of the estimated rank correlation matrix. Here we 
also use a toy example to illustrate this point. Consider the 3x3 correlation 
matrix 



1 


0.7 





0.7 


1 


0.7 





0.7 


1 



Note that A is positive-definite with eigenvalues 1.99, 1.00 and 0.01, but 
2sin(^A) becomes indefinite with eigenvalues 2.01, 1.00 and —0.01. The 
negative eigenvalues will make (14) an ill-defined optimization problem. 
Fortunately, the positive definite issue does not cause any problem for the 
graphical lasso, Dantzig selector and CLIME. Notice that the diagonal ele- 
ments of are obviously strictly positive, and thus Lemma 3 in Raviku- 
mar et al. (2011) suggests that the rank-based graphical lasso always has 
a unique positive definite solution for any regularization parameter A > 0. 
The rank-based neighborhood Dantzig selector and the rank-based CLIME 
are still well defined, even when R*;,) becomes indefinite, and the according 

optimization algorithms also tolerate the indefiniteness of R-^/-) • One might 

consider a positive definite correction of R* for implementing the neighbor- 
hood lasso estimator. However, the resulting estimator shall behave similarly 
to the rank-based neighborhood Dantzig selector because the lasso penalized 
least squares and Dantzig selector, in general, work very similarly [Bickel, 
Ritov and Tsybakov (2009), James, Radchenko and Lv (2009)]. 
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3. Theoretical properties. For a vector v — [vi, . . . ,vi), let llvUmin de- 
note the minimum absolute value, that is, ||v||min = miuj \ vj\. For a matrix 
A = {aij)kxh we define the following matrix norms: the matrix ii norm 
II A||£^ = maxj \aij\, the matrix i^o norm || A||£^ = maxj |aij| and the 
Frobenius norm HAHj? = j) ^ij)^^^- ^'^^ symmetric matrix, its ma- 
trix i \ norm coincides its matrix £qq norm. Denote by Ainin(A) and Amax(A) 
the smallest and largest eigenvalues of A, respectively. Define as the 
true covariance matrix, and let 0* be its inverse. Let A be the true sup- 
port set of the off-diagonal elements in 0*. Let d = maxj'^-^j I^Q^^Qj be 
the maximal degree over the underlying graph corresponding to ©*, and let 
s = ^{e*.^o} be the total degree over the whole graph. 

In this section we establish theoretical properties for the proposed rank- 
based estimators. The main conclusion drawn from these theoretical results 
is that the rank-based graphical lasso, neighborhood Dantzig selector and 
CLIME work as well as their oracle counterparts in terms of the rates of 
convergence. We first provide useful concentration bounds concerning the 
accuracy of the rank-based sample correlation matrix. 

Lemma 1. Fix any < e < 1, and let n > Then there exists some 
absolute constant cq > 0, and we have the following concentration bounds: 

Pr(|f|j - aij \ > e) < 2exp(-cone^), 
Pr(||R^ - S|U,, > e) < /exp(-cone2). 

Lemma 1 is a key ingredient of our theoretical analysis. It basically shows 
that the rank-based sample estimator of I] works as well as the usual sample 
covariance estimator of 5] based on the "oracle data." 

3.1. Rank-based graphical lasso. Denote by Vmin = |^*j| the 

minimal entry of 0* in the absolute scale. Define K^:* = H^^^^lkoo 
Ku* = ||(H^^)~^||^^. Define H* as the Kronecker product S* E*. 

Theorem 1. Assume ||H^c_4(H^^)~'^||£_^ < 1 — k /or k € (0, 1). 
(a) Element-wise maximal bound: if A is chosen such that 

1 1 

^ 6(l+K/4)iv:s*ii'H*max{l,(l+4/K)K2.KH*} ' d' 

with probability at least 1 — exp(— j^coraA^), the rank-based graphical lasso 
estimator 0^ satisfies that O^-^ = for any € A'^ and 

l|0^-0*IUax<2i^H.(l + ^)A. 
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(b) Graphical model selection consistency: picking a regularization param 
eter A to satisfy that 



A < mill 



2(1 + k/4)Ku* ' 6(1 + K/4)i('s* • max{l, (1 + 4/k)K2 Kh*} 



then with probability at least 1 — exp(— jg coraA^), 0^ is sign consistent 
satisfying t 
{i,j)£A^. 



16" 

satisfying that sign{6^j^) = sign{6*j) for any {i,j) € A and 6^j^ = for any 



In Theorem 1, the condition ||H^c_4(H^_^)~"^ < 1 — k is also referred 
as the irrepresentable condition for studying the theoretical properties of the 
graphical lasso [Ravikmnar et al. (2011)]. We can obtain a straightforward 
understanding of Theorem 1 by considering its asymptotic consequences. 

Corollary 1. Assume that there is a constant n E (0,1) such that 
||H^c_4(H^_^)~^ 11^^ <1 — K. Suppose that Ky,* and K^* are both fixed con- 
stants. 

(a) Rates of convergence: assume cP log and pick a regularization 
parameter A such that ;» A = 0{{logp/n)^/'^) . Then we have 



|0;-0*Lax = Op 



logp 



Furthermore, the convergence rates in both Frobenius and matrix ii-norms 
can also be obtained as follows: 



eg-e*\\^ = Op 



(s+p) logp 



n 



0,-0*11,, = Op 



min{s + p, d^} logp 



n 



(b) Graphical model selection consistency: assume V'min is also fixed and 
n ^ d^logp. Pick a A such that d~^ ^ A = 0((log p/n)^/^). Then we have 
s\gn{elf) = sign(0^): V(i,i) G A and sign(^^/) = 0, V(i, j) G A^. 

Under the same conditions of Theorem 1 and Corollary 1, by the results 
in Ravikumar et al. (2011), we know that the conclusions of Theorem 1 and 
Corollary 1 hold for the oracle graphical lasso. In other words, the rank- 
based graphical lasso estimator is comparable to its oracle counterpart in 
terms of rates of convergence. 

3.2. Rank-based neighborhood Dantzig selector. We define 6 = minfc^^^, 
B = Aniax(0*) and M = ||0*||£^. For each variable X^, define the corre- 
sponding active set Ak = {j 7^ k: O^,- 7^ 0} with the maximal cardinality 
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d = maxk\Ak\- Then we can organize 0^^^ and @*k) with respect to Ak 
as ^(fc) = {^X^^*Al) and 

n* =( ®*'^kA% \ 

'''' WaI^. ®AlAl)- 
Likewise we can partition cr^^,^ and S^^^ with respect to Ak- 

Theorem 2. Pick the A such that dX = o(l) and bnX > 12'kM . With 
probability at least 1 — exp{—coj^nX'^) , there exists Cb,B,M > depending 
on b, B and M only such that 

IIGL - 0*11,^ < 2||Cd - ^ Cb,B,MdX. 

Corollary 2. Suppose that b, B and M are all fixed. Let d'^ logp, 
and pick X such that d~^ » A = 0((logp/n)^/^). Then we have 



\\@:,-@%,=op(^d^^ and \\@:,-@%,=op[d^y 

Yuan (2010) estabhshed the rates of convergence of the neighborhood 
Dantzig selector under the £i norm, which can be directly applied to the or- 
acle neighborhood Dantzig selector under the nonparanormal model. Com- 
paring Theorem 2 and Corollary 2 to the results in Yuan (2010), we see that 
the rank-based neighborhood Dantzig selector and the oracle neighborhood 
Dantzig selector achieve the same rates of convergence. 

Dantzig selector and the lasso are closely related [Bickel, Ritov and Tsy- 
bakov (2009), James, Radchenko and Lv (2009)]. Similarly to the lasso, 
the Dantzig selector tends to over-select. Zou (2006) proposed the adaptive 
weighting idea to develop the adaptive lasso which improves the selection 
performance of the lasso and corrects its bias too. The very same idea can be 
used to improve the selection performance of Dantzig selector which leads 
to the adaptive Dantzig selector [Dicker and Lin (2009)]. We can extend the 
rank-based Dantzig selector to the rank-based adaptive Dantzig selector. 
Given adaptive weights w^, consider 

(15) ^fc"'"^ = argmin||wfeo/3||^^ subject to |Rf^,N/3 - ff^ J < Aw^, 

where o denotes the Hadamard product, and a^^xi < b^xi denotes the set 
of entrywise inequalities a, < 6j for ease of notation. In both our theoretical 

analysis and numerical implementation, we utilize the optimal solution Pf, 
of the rank-based Dantzig selector to construct the adaptive weights by 

(16) wf=('i^rv^' 
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Define fi^ = {®XaX^^X^ ^^^^ f^l = (/^A'O)- Thus the support of 
exactly coincides with that of ^(/^.)) and then it is further equivalent to 

the active set Ak- Define ^fc = PXIImin, Gk = \\{'^*A^A^)~'^h^ and Hk = 
W^AIaS^XaX'W^o. for A; = 1, 2, ... ,p. Let C, = AB\2 + A). 



Theorem 3. For each k, we pick A = as in (11) satisfying that A^ > 
and o(l) = d\d < min{||, 4Cod(/fe+2Gfe) " cfei' and pick \ = \ad as in 
(15) such that ^ > A„rf > max{i^, (CodA^ + i)^}, and o(l) = dA^^ < 

min{Amin(^^^.y^^.)i 2g^' SGfeC^Vcfc) J'" addition, we also choose = 
as m (^i^j /or eac/i k. Then with a probability at least 1 — p^exp(— cq^ • 
min{A^^, -|^A^}), for each k, the rank-based adaptive Dantzig selector finds 

. '^s.nad .'^s.nad '^s.nad^ , . , '^^s.nad. . / ^* \ 

the unique solution p/, = {PAk •>f^Al ) ^^'^ ^^?P-\PAk ) —^^&^\PAkl 

and Pai ~ ^' '^'^^ thus the rank-based neighborhood adaptive Dantzig se- 
lector is consistent for the graphical model selection. 

Corollary 3. Suppose h, B, M, il)k, Gk and Hk (l<k<p) are all 
constants. Assume that n ^ d^logp and Amin(5^^j.^j.) ^ d^(logp/n)-^/^. Pick 
the tuning parameters A^ and A^d such that ^ » A^ = 0{{logp/n)^/'^) and 
min{^ • Amm(^^^,^j,)5 ^} ^ Aa^ ^ dXd. Then with probability tending to 1, 
for each k, the rank-based adaptive Dantzig selector with = as in (16) 

'-s.nad '■s.nad '•s.nad '■s.nad 

finds the unique optimal solution p^, = [PAk ' f^Al ) with sign(/3_4^ ) = 

sign(y9^^) and Pai ~ ^' ^'^^ thus the rank-based neighborhood adaptive 
Dantzig selector is consistent for the graphical model selection. 

The sign-consistency of the adaptive Dantzig selector is similar to that 
of the adaptive lasso [van de Geer, Biihlmann and Zhou (2011)]. Based on 
Theorem 2 we construct the adaptive weights in (16) which is critical for the 
success of the rank-based adaptive Dantzig selector in the high-dimensional 
setting. It is important to mention that the rank-based adaptive Dantzig 
selector does not require the strong irrepresentable condition for the rank- 
based graphical lasso to have the sparsity recovery property. Our treatment 
of the adaptive Dantzig selector is fundamentally different from Dicker and 
Lin (2009). Dicker and Lin (2009) focused on the canonical linear regression 
model and constructed the adaptive weights as the inverse of the absolute 
values of ordinary least square estimator. Their theoretical results only hold 
in the classical fixed p setting. In our problem p can be much bigger than n. 
The choice of adaptive weights in (16) plays a critical role in establishing the 
graphical model selection consistency for the adaptive Dantzig selector under 
the high-dimensional setting where p is at a nearly exponential rate to n. 
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Our technical analysis uses some key ideas such as the strong duality and 
the complementary slackness from the linear optimization theory [Bertsimas 
and Tsitsiklis (1997), Boyd and Vandenberghe (2004)]. 

3.3. Rank-based CLIME. Compared to the graphical lasso, the CLIME 
can enjoy nice theoretical properties without assuming the irrepresentable 
condition [Cai, Liu and Luo (2011)]. This continues to hold when comparing 
the rank-based graphical lasso and the rank-based CLIME. 

Theorem 4. Recall that M = ||0*||^^. Pick a regularizing parameter A 
such that nX > IIttM. With a probability at least 1 — exp(— -j^nA^), 

l|0c-01lmax<2MA. 

Moreover, assume that cP logp, and suppose M is a fixed constant. Pick 
a regularization parameter A satisfying A = O((logp/n)^/^). Then we have 

lie:-e.||_ = Op(/!^). 

Theorem 4 is parallel to Theorem 6 in Cai, Liu and Luo (2011) which can 
be used to establish the rate of convergence of the oracle CLIME. 

To improve graphical model selection performance, Cai, Liu and Luo 
(2011) suggested an additional thresholding step by applying the element- 
wise hard-thresholding rule to 0^, 

where t„ > 2MA is the threshold, and A is given in Theorem 4. Here we show 
that consistent graphical model selection can be achieved by an adaptive 
version of the rank-based CLIME. Given an adaptive weight matrix W we 
define the rank-based adaptive CLIME as follows: 

(18) 0^^ = argmin II Wo0||i subject to |R''0 - I| < AW, 



where Apxp < ^pxp is a simplified expression for the set of inequalities aij < 
bij (for all 1 <i,j <p). Write W = (wi, . . . ,Wp). By Lemma 1 in Cai, Liu 
and Luo (2011) the above linear programming problem in (18) is exactly 
equivalent to p vector minimization subproblems, 

= argmin \\wk o d\\£^ subject to |R*0 — e^] < Xw^. 

emp 

In both our theory and implementation, we utilize the rank-based CLIME's 
optimal solution 0^, to construct an adaptive weight matrix W by 

(19) w^=(i0:i+i) 
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We now prove the graphical model selection consistency of the rank-based 
adaptive CLIME. Denote &* as{9l,..., 6*), and define Ak = AkU{k}. Then 
we can organize 61 and S* with respect to Ak and A^. For k = 1,2, . . . ,p, 
define G, = mX_,X'\^e^ = \\^\aS^XaX'\\^^- 

Theorem 5. Recall V'mm = min^j ,,)g_4 For each k pick A = Ac as 
(12) ^^^h ihat min{%f, -^}>\c>'^ and dX, = 

o(l), and we further pick the regularization parameter X = Xac as in ( 18) sat- 
isfying that ^>Xac> max{12^/n, (2MAc + ^)^^^#^} and o(l) = dA^c < 



"^WAmin(S^,^J, 4G,(Vw+G,) ^- choose W = W 



as 



in (19). With a probability at least 1 — exp(— conminjA^c, jg^Ac}), the 
rank-based adaptive CLIME's optimal solution is sign consistent, that 
is, sign(6'|j"'^) = sign{9*j) for {i,j) e A and sign(6'fj°'^) = for {i,j) € A'^. 

Corollary 4. Suppose V'miii! M, Gk and (1 ^ k < p) are all con- 
stants. Assume that n ^ d^logp and Amm(S*T t ) ^ d{[ogp/nY/'^ . Pick the 

regularization parameters Ac and Xac such that \>Xc = 0{{\ogp / n)^^"^) , and 
min{Amin(X;*T 7 )M 3} > Aac > Ac. Let W = W= as in (19). Then with 
probability tending to 1, the rank-based adaptive CLIME's optimal solution 
&ac ^■^ ^^9''^ consistent for the graphical model selection, that is, sign(0|j"'^) = 
sign(6'*^) for {i,j) e A and sign^ff'') = for {i,j) G A"". 

The nice theoretical property of the rank-based CLIME allows us to con- 
struct the adaptive weights in (19), which is critical for establishing the 
graphical model selection consistency for the rank-based adaptive CLIME 
estimator in the high-dimensional setting without the strong ir-representable 
condition. 



4. Numerical properties. In this section we present both simulation stud- 
ies and real examples to demonstrate the finite sample performance of the 
proposed rank-based estimators. 

4.1. Monte Carlo simulations. In the simulation study, we consider both 
Gaussian data and nonparanormal data. In models 1-4 we draw n indepen- 
dent samples from Np{0,'S) with four different ©: 

Model 1: On = 1 and = 0.5; 
Model 2: On = 1, ^,,,+1 = 0.4 and 9i,i+2 = ^i,i+3 = 0.2; 
Model 3: Randomly choose 16 nodes to be the hub nodes in 0, and each 
of them connects with 5 distinct nodes with @ij = 0.2. Elements, not asso- 
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Table 2 

List of all estimators in the numercial study 



Notation 



Details 



GLASSO 
MB 

MB.au (or MB.ai) 
NDS 

NDS.au (or NDS.ai) 

CLIME 

LLW 

R-GLASSO 
R-NDS 

R-NDS.au (or R-NDS.ai) 
R-NADS 

R-NADS.au (or R-NADS.ai) 

R-CLIME 

R-ACLIME 



Penalized likelihood estimation via graphical lasso 
Neighborhood lasso [Meinshausen and Biihlmann (2006)] 
MB + aggregation by union (or by intersection) 
Neighborhood selection via Dantzig selector 
NDS + aggregation by union (or by intersection) 
Constrained £i minimization estimator 
The "plug-in" extension of GLASSO [Liu, Lafferty and 
Wasserman (2009)] 

Proposed rank-based extension of GLASSO 
Proposed rank-based extension of NDS 
R-NDS -I- aggregation by union (or by intersection) 
Proposed rank-based adaptive extension of R-NDS 
R-NADS + aggregation by union (or by intersection) 
Proposed rank-based extension of CLIME 
Proposed rank-based adaptive extension of CLIME 



ciated with hub nodes, are set as in 0. The diagonal element a is chosen 
similarly as that in the previous model. 

Model 4: = 0o + crl, where 0o is a zero-diagonal symmetric matrix. 
Each ofF-diagonal element &oij independently follows a point mass 0.99(5o + 
0.01(5o.2, and the diagonal element a is set to be the absolute value of the 
minimal negative eigenvalue of 0o to ensure the semi-positive-definiteness 
of 0. 

In models lb-4b we first generate n independent data from Np{0, S) and 
then transfer the normal data using transformation functions 

S—[fl 1/2 5/3 5/4 1/5 5 /l 1/2 ' /s 5/4 ' /s 5 • • •] ; 

where fi{x) = x, /2(x) = log(x), f^ix) = x^ , /^{x) = log(Y3^) and f^^x) = 
/2(3;)/{x<-i} + /i(2;)/{-i<x<i} + ifiix - 1) + l)/{a;>i}. In aU cases we let 
n = 300 and p= 100. 

Table 2 summarizes all the estimators investigated in our study. For each 
estimator, the tuning parameter is chosen by cross-validation. Estimation 
accuracy is measured by the average matrix ^2-iiorm over 100 independent 
replications, and selection accuracy is evaluated by the average false posi- 
tive/negative. 

The simulation results are summarized in Tables 3-6. First of all, we 
can see that the graphical lasso, neighborhood selection and CLIIME do not 
have satisfactory performance under models lb-4b due to the lack of ability 
to handle nonnormality. Second, the three rank-based estimators perform 
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Table 3 

Estimation performance in the Gaussian model. Estimation accuracy is measured by the 
matrix i2-norm with standard errors in the bracket 



Method 


Model 1 


Model 2 


Model 3 


Model 4 


GLASSO 


0.74 


1.23 


0.67 


0.63 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


LLW 


0.84 


1.28 


0.68 


0.67 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


R-GLASSO 


0.81 


1.30 


0.64 


0.70 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


NDS 


0.78 


1.25 


0.61 


0.57 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


R-NDS 


0.81 


1.28 


0.63 


0.62 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


CLIME 


0.71 


1.19 


0.54 


0.59 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


R-CLIME 


0.79 


1.27 


0.58 


0.61 




(0.01) 


(0.02) 


(0.01) 


(0.01) 



similarly to their oracle counterparts. Note that in models lb-4b the oracle 
graphical lasso, the oracle neighborhood Dantzig and the oracle CLIME are 
actually the graphical lasso, the neighborhood Dantzig and the CLIME in 
models 1-4. In terms of precision matrix estimation the rank-based CLIME 



Table 4 

Estimation performance in the nonparanormal model. Estimation accuracy is measured 
by the matrix I2 -norm with standard errors in the bracket 



Method 


Model lb 


Model 2b 


Model 3b 


Model 4b 


GLASSO 


1.77 


2.68 


1.31 


1.28 




(0.01) 


(0.06) 


(0.02) 


(0.01) 


LLW 


0.84 


1.28 


0.68 


0.67 


(0.01) 


(0.01) 


(0.01) 


(0.01) 


R-GLASSO 


0.81 


1.30 


0.64 


0.70 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


NDS 


1.41 


2.42 


1.16 


1.13 




(0.01) 


(0.03) 


(0.02) 


(0.02) 


R-NDS 


0.81 


1.28 


0.63 


0.62 




(0.01) 


(0.02) 


(0.01) 


(0.01) 


GLIME 


1.22 


2.51 


1.24 


1.03 




(0.02) 


(0.03) 


(0.02) 


(0.01) 


R-CLIME 


0.79 


1.27 


0.58 


0.61 




(0.01) 


(0.02) 


(0.01) 


(0.01) 
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Table 5 

Selection performance in the Gaussian model. Selection accuracy is measured by counts 
of false negative (#FN) or false positive (#FP) with standard errors in the bracket 



Model 1 Model 2 Model 3 Model 4 





#FN 




#FN 


#FP 




#FP 


#FN 


#FP 


GLASSO 


0.00 


521.21 


263.16 


45.21 


0.00 


114.48 


0.03 


35.33 




(0.00) 


(1.91) 


(0.58) 


(1.26) 


(0.00) 


(1.94) 


(0.02) 


(1.29) 


LLW 


0.00 


518.84 


264.18 


43.45 


0.00 


116.02 


0.04 


35.08 




(0.00) 


(1.91) 


(0.56) 


(1.34) 


(0.00) 


(2.01) 


(0.02) 


(1.19) 


R-GLASSO 


0.00 


505.77 


264.86 


48.01 


0.00 


114.89 


0.03 


37.13 




(0.00) 


(1.67) 


(0.57) 


(1.57) 


(0.00) 


(2.17) 


(0.02) 


(1.07) 


MB.au 


0.00 


154.81 


232.99 


89.61 


0.00 


44.03 


0.02 


41.22 




(0.00) 


(1.29) 


(0.74) 


(1.37) 


(0.00) 


(0.81) 


(0.01) 


(0.77) 


R-NDS.au 


0.00 


163.78 


230.77 


118.46 


0.00 


69.16 


0.03 


49.31 




(0.00) 


(1.27) 


(0.79) 


(2.12) 


(0.00) 


(0.92) 


(0.02) 


(0.88) 


R-NADS.au 


0.00 


80.90 


218.69 


83.62 


0.00 


60.75 


0.03 


48.59 




(0.00) 


(2.52) 


(1.02) 


(2.90) 


(0.00) 


(1.04) 


(0.02) 


(0.92) 


MB.ai 


0.00 


30.62 


260.76 


21.79 


0.00 


9.42 


0.04 


9.58 




(0.00) 


(0.53) 


(0.55) 


(0.60) 


(0.00) 


(0.31) 


(0.02) 


(0.34) 


R-NDS.ai 


0.00 


38.62 


259.66 


29.34 


0.00 


11.52 


0.07 


11.87 




(0.00) 


(0.52) 


(0.61) 


(0.68) 


(0.00) 


(0.40) 


(0.04) 


(0.40) 


R-NADS.ai 


0.06 


14.92 


256.16 


24.62 


0.00 


10.54 


0.08 


10.98 




(0.02) 


(0.11) 


(0.68) 


(0.79) 


(0.00) 


(0.36) 


(0.04) 


(0.38) 


CLIME 


0.00 


143.88 


263.77 


34.71 


0.00 


32.53 


0.02 


32.59 




(0.00) 


(0.10) 


(0.57) 


(1.42) 


(0.00) 


(0.78) 


(0.01) 


(1.17) 


R- CLIME 


0.00 


148.24 


265.81 


38.23 


0.00 


37.44 


0.04 


36.56 




(0.01) 


(3.11) 


(1.22) 


(2.55) 


(0.05) 


(2.45) 


(0.33) 


(1.18) 


R-ACLIME 


0.00 


82.53 


264.74 


34.52 


0.00 


29.83 


0.07 


31.09 




(0.00) 


(0.13) 


(0.63) 


(2.60) 


(0.00) 


(0.61) 


(0.03) 


(1.02) 



seems to be the best, while the rank-based neighborhood adaptive Dantzig 
selector has the best graphical model selection performance. We have also 
obtained the simulation results under the matrix £i-norm. The conclusions 
stay the same. For space consideration we leave these £i-norm results to the 
technical report version of this paper. 

4.2. Applications to gene expression genomics. We illustrate our pro- 
posed rank-based estimators on a real data set to recover the isoprenoid ge- 
netic regulatory network in Arabidposis thaliana [Wille et al. (2004)]. This 
dataset contains the gene expression measurements of 39 genes (exclud- 
ing protein GGPPS7 in the MEP pathway) assayed on n = 118 Affymetrix 
GeneChip microarrays. 

We used seven estimators (GLASSO, MB, CLIME, LLW, R-GLASSO, 
R-NADS and R-ACLIME) to reconstruct the regulatory network. The first 
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Table 6 

Selection performance in the nonparanormal model. Selection accuracy 
IS measured by counts of false negative (#FN) or false positive (#FP) 
with standard errors in the bracket 



Model lb Model 2b Model 3b Model 4b 





#FN 


#rp 


#FN 


#FP 


#FN 


#FP 


#FN 


#FP 


GLASSO 


58.81 


470.05 


286.40 


44.70 


9.82 


134.70 


8.06 


44.20 




(0.35) 


(5.30) 


(0.74) 


(1.48) 


(0.41) 


(2.08) 


(0.36) 


(1.33) 


LLW 


0.00 


518.84 


264.18 


43.45 


0.00 


116.02 


0.04 


35.08 




(0.00) 


(1.91) 


(0.56) 


(1.34) 


(0.00) 


(2.01) 


(0.02) 


(1.19) 


R-GLASSO 


0.00 


505.77 


264.86 


48.01 


0.00 


114.89 


0.03 


37.13 




(0.00) 


(1.67) 


(0.57) 


(1.57) 


(0.00) 


(2.17) 


(0.02) 


(1.07) 


MB.au 


56.28 


472.86 


283.15 


61.69 


12.99 


99.10 


8.28 


57.65 




(0.26) 


(4.11) 


(0.64) 


(1.04) 


(0.46) 


(1.31) 


(0.36) 


(0.90) 


R-NDS.au 


0.00 


163.78 


230.77 


118.46 


0.00 


69.16 


0.03 


49.31 




(0.00) 


(1.27) 


(0.79) 


(2.12) 


(0.00) 


(0.92) 


(0.02) 


(0.88) 


R-NADS.au 


0.00 


80.90 


218.69 


83.62 


0.00 


60.75 


0.03 


48.59 




(0.00) 


(2.52) 


(1.02) 


(2.90) 


(0.00) 


(1.04) 


(0.02) 


(0.92) 


MB.ai 


68.68 


197.44 


304.71 


22.72 


16.88 


50.25 


11.67 


23.88 




(0.16) 


(1.12) 


(0.61) 


(0.56) 


(0.52) 


(0.92) 


(0.42) 


(0.50) 


R-NDS.ai 


0.00 


38.62 


259.66 


29.34 


0.00 


11.52 


0.08 


11.87 




(0.00) 


(0.52) 


(0.61) 


(0.68) 


(0.00) 


(0.40) 


(0.04) 


(0.40) 


R-NADS.ai 


0.06 


14.92 


256.16 


24.62 


0.00 


10.54 


0.08 


10.98 




(0.02) 


(0.11) 


(0.68) 


(0.79) 


(0.00) 


(0.36) 


(0.04) 


(0.38) 


CLIME 


47.14 


385.95 


286.16 


45.25 


10.02 


123.31 


7.87 


46.38 




(0.39) 


(1.99) 


(0.74) 


(1.45) 


(0.41) 


(2.11) 


(0.36) 


(1.34) 


R-CLIME 


0.00 


148.24 


265.81 


38.23 


0.00 


37.44 


0.04 


36.56 




(0.01) 


(3.11) 


(1.22) 


(2.55) 


(0.05) 


(2.45) 


(0.33) 


(1.18) 


R-ACLIME 


0.00 


82.53 


264.74 


34.52 


0.00 


29.83 


0.07 


31.09 




(0.00) 


(0.13) 


(0.63) 


(2.60) 


(0.00) 


(0.61) 


(0.03) 


(1.02) 



three estimators are performed after taking the log-transformation of the 
original data, and the other four estimators are directly applied to the orig- 
inal data. To be more conservative, we only considered the integration by 
union for the neighborhood selection procedures. We generated 100 inde- 
pendent Bootstrap samples and computed the frequency of each edge being 
selected by each estimator. The final model by each method only includes 
edges selected by at least 80 times over 100 Bootstrap samples. We report 
the number of selected edges by each estimator in Table 7. The rank-based 
graphical lasso performs similarly to the LLW method. The rank-based adap- 
tive CLIME produces the sparsest graphs. We also compared pairwise in- 
tersections of the selected edges among different estimators. More than 70% 
of the selected edges by GLASSO, MB or CLIME turn out to be validated 
by both LLW and R-GLASSO, and more than 40% of the selected edges 
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Table 7 

The isoprenoid genetic regulatory network: counts of stable edges 





GLASSO 


Neighborhood LASSO 


CLIME 




# of stable edges 


100 


101 


67 






LLW 


R-GLASSO 


R-NADS 


R-ACLIME 


# of stable edges 


87 


88 


50 


52 



by GLASSO, MB or CLIME are justified by R-NADS and R-ACLIME. The 
selected models support the biological arguments that the interactions be- 
tween the pathways do exist although they operate independently under 
normal conditions [Laule et al. (2003), Rodriguez-Concepcion et al. (2004)]. 

5. Discussion. Using ranks of the raw data for statistical inference is 
a powerful and elegant idea in the nonparametric statistics literature; see 
Lehmann (1998) for detailed treatment and discussion. Some classical rank- 
based statistical methods include Friedman's test in analysis of variance and 
Wilcoxon signed-rank test. This work is devoted to the rank-based estima- 
tion of of the nonparanormal model under a strong sparsity assumption 
that has only a few nonzero entries, and our results show that rank- 
based estimation is still powerful and elegant in the new setting of high- 
dimensional nonparametric graphical modeling. In a separate paper, Xue 
and Zou (2011a) also studied the problem of optimal estimation of S of the 
nonparanormal model under a weak sparsity assumption that S belongs to 
some weak iq ball and showed that a rank-based thresholding estimator is 
adaptive minimax optimal under the matrix £i norm and £2 norm. 

APPENDIX: TECHNICAL PROOFS 

Proof of Theorem 1. Using Lemma 3 in Ravikumar et al. (2011), 
&g >- is uniquely characterized by the sub-differential optimality condition 
that — {&g)~^ + AZ = 0, where Z is the sub-differential with respect to 
&g. Define the "oracle" estimator 0^ by 

&l= argmin -logdet(0) -htr(s''0) A V |%|. 

Then we can construct Z to satisfy that R* — (0^)~^ -|- AZ = 0. As in Raviku- 
mar et al. (2011) the rest of the proof depends on a exponential- type con- 
centration bound concerning the accuracy of the sample estimator of the 
correlation matrix under the entry-wise £oo bound. Our Lemma 1 fulfills 
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that role. With Lemma 1, Theorem 1 can be proved by following the line of 
the proof in Ravikumar et al. (2011). For the sake of space, we move the rest 
of proof to the technical report version of this paper [Xue and Zou (2011b)]. 

We now prove Lemma 1. First, Spearman's rank correlation fij can be 
written in terms of the Hoeffding decomposition [Hoeffding (1948)] 

n-2 3 
n + 1 n+1 



(20) rij = ^ ^ ^ , 1 dij, 



where dij = Efc^^z sign(xfci - xu) ■ sign(xfcj - xij), and 

(21) Uij = — — — Y] sign{xki - xii) ■ sign(xfcj - Xmj)- 

n{n—l)(n — 2) ^ ■' 

Direct calculation yields that E{uij) = ^sm~^{^) [Kendah (1948)]. Then 
we can obtain that cjjj = 2sm{^E{uij)). By definition ffj = 2sin(^fjj). Note 
that 2sin(^-) is a Lipschitz function with the Lipschitz constant tt/3. Then 
we have 

Pr(|f|j- - aij\ >e)< Fillhj - E{uij)\ > 



3 J.. 3 



Applying (20) and (21) yields rij - E{uij) = Uij - E{uij) + i^dij - i^Uij- 
Note \uij\ < 3 and \dij\ < 1. Hence, \uij\ < ^(?^ + 1) and \dij\ < ^(n + 1) 
always hold provided that n > 127r/e, which are satisfied by the assumption 
in Lemma 1. For such chosen n, we have 

'^^\ ^ -n f i TP/ M ^ A 

2^/' 



Fr(^\rij - E{uij)\ > ^) < PrJ^juij - E{uij)\ > — I . 



Finally, we observe that function of independent samples (xi, . . . , x„). 

Now we make a claim that if we replace the tth sample by some x^, the 
change in Uij will be bounded as 

15 

(22) sup |tiij(xi,...,X„) -Mjj(xi,...,Xt_i,Xt,Xt+i,...,X„)| < — . 

Xl,...,X„,Xt 

Then we can apply the McDiarmid's inequality [McDiarmid (1989)] to con- 
clude the desired concentration bound for some absolute constant cq > 0, 



Pr(|rfj. - aij\ > e) < Pr \^\uij - E{uij)\ < 2exp(-cone^). 

Now it remains to verify (22) to complete the proof of Lemma 1. We 
provide a brief proof for this claim. Assume that xj = {xu, . . . jXpt)' is re- 
placed by xj = (xij, . . . , Xpj)', and we want to prove that the change of 
Uij is at most 15/n. Without loss of generality we may assume that rij = 
#{s:sign(xti — x^i) = — sign(xfj — Xsi),s 7^ t} and also assume that rij = 
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#{s : sign(xtj — Xgj) = — sign{xtj — Xsj),s t}. Then we have 

\Uij{xi, . . . ,X„) - Uij{xi, . . .,Xt_i,Xt,Xt+i, . . .,x„)| 

< ^ {sign(xfei - xti) - sign(xfci - xu)} • sign(xfcj - Xmj) 
+ ^ (sign(j;fcj - xtj) - sign(xfcj - Xtj)) ■ sign(j;fci - x^i) 
+ ^ (sign(a;ti - xu) ■ sign(xfj - x^j) - sign(xtj - x^j) 

X sign(xtj - Xmj )) 



n(n-l)(n-2) 



^ 3 • 2 • [ni{n — 2) + nj{n — 2) + nj{n — 1 — rii) + ni(n — 1 — rij)] 



n{n — l)(n — 2) 



12 / 1 

<— 1 + T 



n \ 4 (n - l)(n - 2) 



15 

<— , 

n 



where the third inequahty holds if and only if ni=nj = n — ^. □ 

Proof of Theorem 2. For space of consideration, we only show the 
sketch of the proof, and the detailed proof is relegated to the supplementary 
file [Xue and Zou (2012)] and also the technical report version of this paper 
[Xue and Zou (2011b)]. We begin with an important observation that we 
only need to prove the risk bound for 0^^ because 

WKd - ^ \\®nd - KdK + \\®nd - ^ 2||0L - 0*11^,- 

To bound the difference between &nd ®* under the matrix £i-norm, we 
only need to bound l^^'^"^ — O^j^] and — 0*f,-^ \\£^ for each k = I, . . . ,p. 

To this end, we consider the probability event {||R,* — 5j*||max 

< jgA}, and 

under this event, we can show that for k = 1, . . . ,p, 

(23) ||R^,)/3^-r^,)||,^<A and ||^r' " /3fclk< C^o^^A, 



where Cq is some quantity depending on b, B and M only. 
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Now we can use (23) to further bound \Ol:J}'^ — under the same event 



To this end, we first derive an upper bound for \{0^^'^) ^ — ^| as 

(24) |(^^n~' - {Oluy'\ ^ (l + • + ll^r' - (ilK- 

Notice that \ei^'-ei,\ = Wu^'^r' - {ei,r\ |^-"'^| • \ei,\ and also |^-"'^| < 
Hk'^ - ^kk\ + KkV Then |^^^"'^ - 0*J can be upper bounded by 



\as.nd Q* I ^ 
Wkk -^kk\^ 



\ ( as.nd\~l (a* \Q* |2 

nd a* \ ^ \^ kk ) y"kk) I ' r fcfcl 

i-mk'y'-if^ikr'i 



'kk I \"kk/ I \"kk\ 

B\l + b/M){M/b)X + CpdX] 
- 1 - B[{1 + b/M){M/b)X + CodX] ■ 

Since dX = o(l), we denote the right-hand side as CidX for some Ci > 0. 
Next, we can further obtain a bound for ||^(/;) — 

Il^(fe) - < IK'^fcfc''^ - ^fcfc)^/fc llfi + II^Ll^fe - PDWii 

<CidX-b-^M + B-CodX. 

Thus we can combine (24) and (25) to derive the desired upper bound under 
the same event. This completes the proof of Theorem 2. □ 

Proof of Theorem 3. Throughout the proof, we consider the event 

(26) |||R^-S*|U,,<min(A,,,AA, 

For ease of notation, define = Aq and Xad = Xi. We focus on the proof of 
the sign consistency of p^. in the sequel. 

Under event (26), R^^^^^ is always positive-definite. To see this, the 

Weyl's inequality yields Amin(R^,^J + Amax(R^,^^, - Sl^.^J > Amin(I];4^^J, 
and then we can bound the minimal eigenvalue of R^^_^^ , 



A„.in(R%^J > X^^,{^Xau)- W'^Xa.-^Xa, \\f > Ai(d- Vdid^)) > 0. 

For each k we introduce the dual variables cx^ = {a^)j^j^ G M^""*^ and o;^ = 
{aj)j^k £ IR!|r^- Then the Lagrange dual function is defined as 

L{f3- a+, «,T) = ||wf o + (R^,)/3 - f-^,) - Aiwf )^a+ 
+ (-R^,)/3 + r^,)-Aiw^)^a-, 
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where o denotes the Hadamard product. Due to the strong duahty of hnear 
programming [Boyd and Vandenberghe (2004)], the complementary slack- 
ness condition holds for the primal problem with respect to any primal 
and dual solution pair (/3, a^, q^), which implies that q^[(R*^^/3 — r^^pj — 

XiWj] = and aj [— (R^^,-|/3 — r^^^j — XiWj] = for any j / k. Observe that 

only one of and aJ can be zero since only one of — = ^I'^j 

and — r^^^j = —XiWj can hold indeed, and thus we can uniquely 

define ct^ = — . Then we can rewrite the Lagrange dual function as 

L(/3; Qfc) = (w^ o sign(/3) - Rff^-^a^fp - Ai || o a^W^^ - a^r^fc). 
By the Lagrange duality, the dual problem of (15) is 

m^x;^ -Ai||wf o a^H^^ - (afc,f(^)) subject to |R(fc)Q:fc| < w^. 

Now we shall construct an optimal primal and dual solution pair [(ij^^OLk) 
to the rank-based adaptive Dantzig selector. In addition, we show that 
{Pj^,ak) is actually the unique solution pair to the rank-based adaptive 
Dantzig selector, and Pf, is exactly supported in the true active set Ak- 
To this end, we construct {(3f^,6LiS) as = (0;^^^. , ) = (q!_4^,0) and = 

CpAk^pAl) = C^Ak^^) where a^, = -(R^^^^^J-^w^^ osign(/3;^J and = 

(R^,^J"'(r% + Aiw^^ osign(«^J). _ 

In what follows, we first show that (/3^,,q;/c) satisfies four optimality con- 
ditions, and then we will use these four optimality conditions to prove that 
{Pj.,ak) is indeed a unique optimal solution pair. The four optimality con- 
ditions are stated as follows: 

(27) ^Xa.Pa, - r^. = ^1^1 ° sign(a^J, 

(28) ^Xa,^A, = -wl o sign(^^J, 

(29) |R^c^^^^^-f^.|<Aiw^c, 

(30) |R^c_4^a^J <w^c, 

where (27) and (29) are primal constraints, and (28) and (30) are dual 
constraints. 

Note (27) can be easily verified by substituting q:_4j. and Under 
(26), we can derive upper bounds for Ki = ||(R^ _^ )~^ — (^*A^Ak^~^\\ioo^ 
and K, = ||R^c.,,(<^J-^ - ^AIA^^XaJ-%S- 

Note K, = {RXaJ-' ■ (^Xa, - ^Xa,) ■ (S^A.)"'' then we have 

Kl<\\{'RAkAk) ^\\e^-\\^AkAk-'^*AkAje^-\\i'^AkAk) 
<d\iGk{Gk + Ki). 
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Some simple calculation shows Ki < i^^x^Gk ' other hand, 

< iW^AlAk -^AlAkWl^+HkWMAkAk - ^AkAuWlJ ' Wi^AuA^) ^\\t^ 
<d\i{l + Hk){Gk + Ki) 

^ d\iGk{l + Hk) 
- 1 - dXiGk ■ 

Under probability event (26), we claim about that 
(32) l|wllL<^^^.-dG.-l. 

This claim is very useful to prove the other three optimality conditions (28), 
(29) and (30), and their proofs will be provided later. 

Now we are ready to prove (28), (29) and (30) for the solution pair 
(/S^jjQfc). To prove (28), it is equivalent to show the sign consistency that 
sign(/3X) = sign(^^J since we have R^^_4^q^, = -wj^^ o sign{(3*j(J if we 
plug in otAk to its left-hand side. Recall that = (5]^^_4^)~^(t^^ , and 
then we consider the difference between and , 

Pa, - PX = i^XA^r'i^X - + ° sign(«A.)) 

- mxAj-' - i^XAj~')-x- 

Then we apply the triangle inequality to obtain an upper bound, 

- PX Wi^ < (Gk + /^i)(Ai + Ai \\wX + Ki IkX 11.^ 

- l-dX^Gk^^ + ""^-^^ "^-^ + 

< W^Ak llmin' 

where the last inequality obviously holds by claim (31). Then by the above 
upper bound, sign(/3^^) = sign(/3^^) will be immediately satisfied. 
Next, we can easily obtain (30) via the triangular inequality 

W^AlAk^AkWi^ < W^AlAkC^AkAk) ^^^Jl^oo 



<{Hj, + K2)\\v,X 



d 

00 
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dXiGk + Hk d II 
- 1-dAiGfc " 

< ll'^llllmin' 

where the last inequahty can be easily shown by combining (31) and (32). 
Now it remains to prove (29). Using the facts that = and 5]*0* = I, 

simple calculation yields that 5]^c_4^(5]^^_4^)~^<t^^ = <t^c . Then we can 

rewrite the left-hand side of (29) as 



= ^AIaS^XaX^^'^X + ^1^1 ° sign(a^J) - f^c 



= ^AIaS^Xa,) \^A, - "^X + ^1^1 ° sign(a^J) 

+ C^AlAki^AkAk) ^ - '^*AlAk('^AkAj ^)^*Ak + (^*Al " ^A^)- 
Again we apply the triangle inequality to obtain an upper bound as follows: 



\^A%Ak(^Ak 



A?. 



loo 



< {Hk + K2)(Ai + Ai ||w^^ 11^) + K,\\aX IL + 

dAf Gfe + Ai-fffc ^ dAiGfc(l + i/fc) , < 

< Aiiiw^c mjj^, 

where the last inequality is due to (31) and (32). 

So far, the four optimality conditions have been verified for {(3j^,ak). 
In the sequel, we shall show that (3). is indeed a unique optimal solution. 
First, due to (29) and (30), {f3f.,ak) are feasible solutions to the primal 
and dual problems, respectively. Then (27) and (28) show that {(3i.,dik) 
satisfy the complementary-slackness conditions for both the primal and the 
dual problems. Thus, {Pj^,ai^) are optimal solutions to these problems by 
Theorem 4.5 in Bertsimas and Tsitsiklis (1997). Now it remains to show 
the uniqueness. Suppose there exists another optimal solution (3)., and we 
have ||w^ o ^^^ll^j = ||w^ o /3^, ||^^ . Let Tk denote the support of 0f,, and then 
f3f. = (/9p^ , 0) . By the strong duality we have 

= inf L{(3; ,a^) 
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<L{pf,;a+,a^) 

Thus L{0i^;a'^ ,a^) = ||w^ o /3;j||£j , which immediately imphes that the 



complementary slackness condition holds for the primal problem, that is, 
i%)Pk - - Aiwf )^a+ = and (-R^,)^^ + r^,) - Xi^ffc^- = 0. Now 

let =max(/3^,0) and = min(/3;j, 0). Besides, we can similarly show 
that the complementary slackness condition also holds for the dual problem, 

that is, (R(fc)Q;fc - ^tfht = and (-R^^)Qfc - )^/3fc = 0. Notice that 
oi-Ak 7^ and ct^c = by definition, and then we have 

(33) R%r,^r, - ^A- = ^i^l ° sign(«A)> 

(34) M,A, « A = - wf^. ° sign(^r, ) • 

Observe that for any j € but j ^ Ak, R|_4^Q:yij. = — t(;^sign(/3j) in (34) 
cannot hold since it contradicts with (30). Then it is easy to see that Tk C Ak 
obviously holds, which immediately implies that /3_4^ and /3p^ satisfy the 
same optimality condition (27). Thus the uniqueness follows from (27), (33) 
and the nonsingularity of R^^^.^ • 

Now it remains to verify the cfaims (31) and (32) under event (26). Under 
the event HR'* — S*||max < bXo/M, it has been shown in Theorem 2 that for 
some Co = 4i?^(2 + jj) > 0, we have ||/3^ — P*k\\ei < CodX^. Then we can 



derive a lower bound for llw'^c llmim 



\^Al llmin - ' ' > 



maxjg^ IPf'^'^l + 1/n ^odAo + 1/n ' 



which immediately yields the desired lower bound by noting that 

GkdXi + Hk J^rGkd Hk^h ^ 1 

2Gk • Ai • + T^G^dX, - + {^k + 2Gk)-d+2< ^^^^^ ^ , 

where both inequalities follow from the proper choices of tuning parameters 
Aq and Ai as stated in Theorem 3. On the other hand, 

l-Gk-d\i i>k , , , ^ ^ , 1 ^ 
-il)k - dGk - 1 > ttt; — T (V-'fc + Gfe) • d - 1 > 



2Gk ■ Ai 2Gk ■ Ai 4Gfc • Ai 

where the last inequality follows from the proper choice of Ai as stated in 
Theorem 3. Likewise we can prove the second claim (32) by noticing that 

„ „ 1 1 2 V'fc 

w J < < < — < — — — , 

' min^eA |/3|-"'^| V'fc - CodAo V'fc 4Gfc • Ai 

where we use facts that Vfe > 2Co(iAo and ip'j. > SGkXi. The two claims are 
proved, which completes the proof of Theorem 3. □ 
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Proof of Theorem 4. To bound the difference between 0^ and 0* 
under the entry-wise £oo-iiorm, we consider the event {||R* — 5]||max ^ X/M}. 
First, we show that 0* is always a feasible solution under the above event, 

||R^0* - IL,,. < ||(R^ - S*)0*|U,, < ||R^ - SlU,, • ||0*||,^ < A. 

Note that 0^ is the optimal solution, and then HR''©^ — I||max < X obviously 
holds. Moreover, it is easy to see that by definition ||0g||£^ < ||0*||£j always 
holds. Now we can obtain the desired bound under the entry-wise £(X)-norm. 

11©: -01Uax< 110*11., •||S*®c-I|Uax 

= Af.||(S*-R^)0%R^0:-I|U,, 

< M . IIS* - R^IU,, . 110:11,^ + M . IIR^©: - I|U,, 

< A - ||0*||^^ +MA 

= 2MA. □ 

Proof of Theorem 5. The techniques we use are similar to these for 
the proof of Theorem 3. The detailed proof of Theorem 5 is relegated to the 
supplementary material [Xue and Zou (2012)] and also the technical report 
version of this paper [Xue and Zou (2011b)] for the sake of space. □ 

Acknowledgments. We thank the Editor, the Associate Editor and three 
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SUPPLEMENTARY MATERIAL 

Supplement material for "Regularized rank-based estimation of high- 
dimensional nonparanormal graphical models" 

(DOI: 10.1214/12-AOS1041SUPP; .pdf). In this supplementary note, we 
give the complete proofs of Theorems 2 and 5. 
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